Create Indexes for MySQL InnoDB Table from Python

Overview:

  • In the RDBMS parlance a MySQL table implements the concept of “relation”.
  • While a table can grow to as many number of rows as the limits imposed by the RDBMS implementation, the time taken to query one or more rows from a table should be minimized to the maximum extent possible.
  • The performance of searching one or more rows from a table is dependent on several factors. They include, how the data is distributed among various disk blocks, how much of that data is cached in the main memory, whether the search is performed using an index, if an index is employed what kind of index is used and the characteristics of that index and so on.

 

Example – A range query:

select * from books where price  > 10 and price < 20

 

  • Without any supplementary mechanisms, executing the above range query involves searching all the blocks and finding the records which satisfies the condition price  > 10 and price < 20. The search time is O (N) or linear.
  • When an Index is used, the time taken to execute a query is greatly reduced to accessing only few blocks of data.
  • Throughout this article the behavior explained is for the tables created under InnoDB of the MySQL.

MySQL InnoDB Indexes:

MySQL InnoDB has two kinds of indexes. They are

  • Clustering Indexes and
  • Secondary Indexes

Clustering Indexes:

  • A clustering index in MySQL serves as both table storage as well as an index on to the table. In other words, MySQL Clustering Index = Table Data + Index Data.
  • In some of the database management systems other than MySQL, the clustering index is also called as an Index-Organized Table (IOT).
  • A clustering index in MySQL is a B-Tree, where the nodes of the tree store the column values.
  • A clustering index in MySQL is built on the primary key of a MySQL table.
  • If a MySQL Table does not have a primary key defined on it, then a first unique index available is used as the clustering index.
  • If a table has no primary key and has no unique indexes defined on it then MySQL creates a clustering index based on a column rowid, which it generates internally and increases it monotonically with an integer value every time a new row is inserted.
  • There is only one clustering index for a table and the MySQL creates it automatically when a table is created.

 

Secondary Indexes:

  • A secondary index in MySQL is an index built on a non-primary key column of the table. However a record of the secondary index will have the columns specified for the index and the primary-key field. This way the secondary index is a subset of the clustered index. Having the primary-key in each record of the secondary index is important as it helps identifying the row.
  • Before going ahead with creating various type of indexes on a MySQL table from a Python Program, an understanding of the RDBMS concepts like Super Key, Key, Candidate key, Primary Key and Alternate Key will help in choosing what kind of index is required for a given table and in designing the primary key.
  • While there is only one clustering index for a MySQL table, there could be several secondary indexes defined as imposed by the limits of the implementation.

Selection of primary key and how it affects performance:

While the clustering index stores column values in a B-Tree against the primary key column, the secondary index stores only the index column values and the primary-key as a b-tree. Any change made to the primary-key column will be costly as the record has to be relocated in the clustering index and updates to all secondary indexes should also be made. Hence the primary key column of a MySQL table in InnoDB has to be selected after careful consideration.

 

Selection of Primary Key:

Super Key:     

  • A database relation or a database table will have one or more super keys.
  • A super key is a combination of column values, which will uniquely identify a row in a table.

Key:

  • When columns are removed from a super key to the point where any more removals will not make it a super key is called a key. A database table may have zero or more keys.

Candidate Keys:

  • Any valid key can be used to uniquely identify a row in a database table.
  • In other words, any valid key is a candidate key. A set of keys present in a table together is called candidate keys.

Primary Key:

  • A database designer can select any of the keys from a set of candidate keys as the Primary Key for the table.

Alternate Key:

  • An alternate key is one the key available from a set of candidate keys other than the key, which has been already designated or considered for as a primary key.

 

Creating indexes for MySQL InnoDB table using Python:

  • PyMySQL library is the Python MySQL client through which a Python program can connect to a MySQL server and perform the required database operation.
  • To create an index in a InnoDB Table first a database connection has to be obtained from a Python Program.
  • The connect() method of PyMySQL module accepts the IP address of the database and authentication credentials of the MySQL database server as parameters and returns a connection object to the server.
  • Using the connection object a Database cursor can be obtained through which any SQL command can be executed including the SQL commands that create clustering indexes and secondary indexes.

 

Example:

 

# import the mysql client for python

import pymysql

 

# Create a connection object

ipOctets            = "127.0.0.1"   # IP address of the MySQL database server

uName               = "root"        # User name on the db server

uPassword           = ""            # Password of the db user

characterSet        = "utf8mb4"     # The character set used

typeOfCursor        = pymysql.cursors.DictCursor

 

connection   = pymysql.connect(host=ipOctets, user=uName, password=uPassword,

                               charset=characterSet,cursorclass=typeOfCursor)

 

try:

    # Create a cursor object

    dbCursor        = connection.cursor()                                   

 

    # SQL Statement to create a table

    sqlCreateTable = """create table test.books(isbn varchar(13) not NULL,

                                         title varchar(128),

                                         pages int,

                                         author_first_name varchar(32),

                                         author_last_name varchar(32),

                                         PRIMARY KEY (isbn))"""

                                         

    # Execute the sql - Create table statement

    # MySQL automatically creates a Clustering index on the primary key

    dbCursor.execute(sqlCreateTable)

   

    # SQL Statement to create a secondary index   

    sqlCreateSecondaryIndex = """CREATE INDEX index_author_lastname ON test.books

                                 (author_last_name);"""

                                        

    # Execute the sql - create secondary index

    dbCursor.execute(sqlCreateSecondaryIndex)

   

    # Now list the indexes on the table

    sqlShowIndexes = "show index from test.books";

 

    dbCursor.execute(sqlShowIndexes)

    indexList = dbCursor.fetchall()

   

    # Printing the list of indexes on the table books

    print(indexList)

   

    

except Exception as e:

    print("Exception occured:{}".format(e))

 

finally:

    connection.close()

 

 

Output:

[{'Table': 'books', 'Non_unique': 0, 'Key_name': 'PRIMARY', 'Seq_in_index': 1, 'Column_name': 'isbn', 'Collation': 'A', 'Cardinality': 0, 'Sub_part': None, 'Packed': None, 'Null': '', 'Index_type': 'BTREE', 'Comment': '', 'Index_comment': ''}, {'Table': 'books', 'Non_unique': 1, 'Key_name': 'index_author_lastname', 'Seq_in_index': 1, 'Column_name': 'author_last_name', 'Collation': 'A', 'Cardinality': 0, 'Sub_part': None, 'Packed': None, 'Null': 'YES', 'Index_type': 'BTREE', 'Comment': '', 'Index_comment': ''}]


Copyright 2024 © pythontic.com